Final paper - MIT MAS962

Greg Detre

Tuesday, December 17, 2002

Abstract

The aim of this project was to investigate how spatial and temporal representations are related, and how this is reflected in language. By employing a 2-D grid-world with objects moving through time, we hoped that the analogies between spatial and temporal representations would become apparent, that these representations would self-organise through learning, and that some of the evidence from cognitive psychology might be replicated.

 

Introduction

The aim of this project was to investigate how spatial and temporal representations are related, and how this is reflected in language. When we think about what we mean by "spatial representations", we are talking ultimately about some ensemble of neurons whose activity is involved whenever we think in spatial, geometric terms. This very broad working definition is intended to cover any use of propositional concepts, visualisations, calculations and manipulations that involve spatial geometry. We might imagine that a parallel set of concepts, visualisations and manipulations exists for temporal geometry, correspondingly rooted in "temporal representations".

We would like to be able to understand a representation in more detailed and expressive terms than simply picking out the implicated neurons; it might even be that, for some representations, almost the entire brain is implicated. Rather, we want to build a computational, or functional, model of a representation as a means of understanding it: that is, a system whose outputs are the same as the biological system's for the same inputs. In addressing our original question of how spatial and temporal representations are related, we really want to be able to compare models of spatial and of temporal representations quantifiably, and then to understand where those similarities and differences lie.

 

Psychological evidence

We don't know exactly which neurons are involved when we think about space. We have a broad idea from neuroimaging studies, but it quickly becomes apparent that different neurons are implicated in different ways. A more promising route for the moment involves hypothesising about what sort of computational structures would give rise to the observed properties. The most illuminating of these are uncovered by evidence from linguistics and from cognitive psychology.

We can see immediately, just from an introspective survey, that we often discuss analogous spatial and temporal concepts using the same, or very similar, words. For example, we don't even notice the stretch involved when we say both:

The messenger went from Paris to Istanbul

and:

The meeting went from 3:00 to 4:00 (Jackendoff)

or that we can say:

The spaceship is nearly here.

and:

Christmas is nearly here.

We can also look at the way we use timelines, based on an archetypal spatial metaphor. It has long been established that in English we use a horizontal timeline with the future to the right, whereas Mandarin Chinese speakers use a vertical timeline with the future stretching out below. Furthermore, Mandarin speakers answer simple time-sequence questions more rapidly than English speakers when the objects on the screen move vertically (Boroditsky, 2001), implying that differences in the linguistic framework reveal differences in the underlying cognitive representations. More specifically, it seems almost as though the mapping from the one temporal dimension to the three spatial dimensions in Mandarin is orthogonal to the mapping involved for English speakers. This may be too strong a statement: although we know that dimensionality must be represented somehow, in order to produce the behaviour that is evident and required in any spatial reasoning, we really have no idea yet exactly what sort of function relates the temporal to the spatial.

 

The simulation

A priority for any model, then, is to reveal some of the underlying similarity between the spatial and temporal representations, and to show how a spatial concept might be co-opted into the temporal domain (or perhaps vice versa).

We started with a preliminary sketch of a simulation, involving only two spatial dimensions and a discrete temporal one. It was felt that almost all of the interesting behaviour could be captured without a third spatial dimension, especially since the mapping to time already collapses space to a single dimension. One agent and one object, taking the roles of landmark and target (or "trajector"), were all that was required, with the agent initially rooted to the spot. The object appears at a random location on the edge of the grid-world, and moves in a straight line towards, through and past the agent until it reaches the boundary and is repositioned elsewhere. The object moves at a different rate each time it is repositioned. At every timestep, the agent utters a string expressing its understanding of the object's relation to it, e.g. "front near left" or "behind far".
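A minimal sketch of this simulation loop might look as follows (in C++; the names, thresholds and the hard-coded utterance function are illustrative stand-ins for the learned concept algorithms, not the original code):

#include <cmath>
#include <cstdlib>
#include <iostream>
#include <string>

struct Vec2 { double x, y; };

const double GRID = 10.0;               // half-width of the world (assumed)
const double PI = 3.14159265358979;

// Respawn the object at a random point on the boundary (a circular
// boundary stands in for the grid edge for brevity), aimed straight
// through the agent at the origin, at a fresh random speed.
void respawn(Vec2& pos, Vec2& vel) {
    double a = 2.0 * PI * std::rand() / (double)RAND_MAX;
    pos = { GRID * std::cos(a), GRID * std::sin(a) };
    double speed = 0.5 + 1.5 * std::rand() / (double)RAND_MAX;
    vel = { -speed * pos.x / GRID, -speed * pos.y / GRID };
}

// Illustrative utterance of the object's relation to the agent.
std::string utter(const Vec2& p) {
    std::string s = (p.y > 0.0) ? "front" : "behind";
    s += (std::hypot(p.x, p.y) < 3.0) ? " near" : " far";
    s += (p.x < 0.0) ? " left" : " right";
    return s;
}

int main() {
    Vec2 pos, vel;
    respawn(pos, vel);
    for (int t = 0; t < 100; ++t) {             // discrete timesteps
        pos.x += vel.x;
        pos.y += vel.y;
        if (std::hypot(pos.x, pos.y) > GRID)    // past the boundary: reposition
            respawn(pos, vel);
        std::cout << t << ": " << utter(pos) << "\n";
    }
}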

We chose to capture spatial and temporal concepts as algorithms that, given some local coordinate information about a target, could be broken down into parsable formulae. For instance, the spatial concept of "inFront" amounts to a simple comparison between the y coordinates of the landmark and the trajector:

inFront: landmarkY > targetY

To begin with, the landmark coordinates are always fixed to (0, 0), reflecting the agent's egocentric view of the world, though eventually they can take some non-origin value when calculating the spatial concept for some external position.
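In conventional infix form this is a one-line function; a trivial sketch (the signature is our own illustration), with the landmark defaulting to the egocentric origin:

// "inFront" as a function of both positions; the landmark argument
// defaults to the agent's own egocentric origin, but may be any
// external position.
bool inFront(double targetY, double landmarkY = 0.0) {
    return landmarkY > targetY;
}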

I employed Polish (prefix) notation, which makes such formulae easier to parse as strings; in this notation the same function looks like this:

inFront: boolABiggerThanB functLandmarkY functTargetY

Here "boolABiggerThanB" is a self-explanatory boolean function, and the other two tokens evaluate to the two y coordinates. I employed the following primitives:

Tests:

    >, =, AND, OR, NOT

Functions:

    +, -, *, /, ^2, sqrt

Variables:

    targetX

    targetY

    landmarkX

    landmarkY

    number (some arbitrary input number)
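Formulae built from these primitives can be evaluated from a token stream by straightforward recursive descent: each operator consumes exactly as many sub-expressions as its arity, so the string needs no parentheses. The following sketch dispatches only a handful of the token names used in the examples here (it is our reconstruction, not the original parser), with booleans represented as 1.0/0.0:

#include <iostream>
#include <sstream>
#include <string>

struct World { double targetX, targetY, landmarkX, landmarkY; };

// Evaluate one prefix (Polish) expression from the token stream.
double evalExpr(std::istringstream& in, const World& w) {
    std::string tok;
    in >> tok;
    if (tok == "functTargetX")   return w.targetX;
    if (tok == "functTargetY")   return w.targetY;
    if (tok == "functLandmarkX") return w.landmarkX;
    if (tok == "functLandmarkY") return w.landmarkY;
    if (tok == "boolABiggerThanB") {
        double a = evalExpr(in, w);
        double b = evalExpr(in, w);
        return a > b ? 1.0 : 0.0;
    }
    if (tok == "boolAndPQ") {
        double p = evalExpr(in, w);
        double q = evalExpr(in, w);
        return (p != 0.0 && q != 0.0) ? 1.0 : 0.0;
    }
    if (tok == "boolNotP")
        return evalExpr(in, w) == 0.0 ? 1.0 : 0.0;
    return std::stod(tok);       // otherwise: a literal number such as "3"
}

int main() {
    World w{ 2.0, -1.0, 0.0, 0.0 };   // egocentric landmark at the origin
    std::istringstream s("boolABiggerThanB functLandmarkY functTargetY");
    std::cout << (evalExpr(s, w) != 0.0 ? "inFront" : "not inFront") << "\n";
}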

We can use this algorithmic representation to compare the spatial and temporal senses of "near":

 

boolNearButNotHere
    << "boolAndPQ"
        << "boolABiggerThanB"
            << "3"
            << "functDistanceToTarget"
        << "boolNotP"
            << "boolAtSamePos"

boolImpactSoonButNotYet
    << "boolAndPQ"
        << "boolABiggerThanB"
            << "3"
            << "functWillImpactIn"
        << "boolNotP"
            << "boolAtImpactNow"

 

This calls a number of further functions: "functDistanceToTarget" evaluates to the Euclidean distance to the target, and "functWillImpactIn" evaluates to the number of timesteps before the object will be at the same position as the agent, given its current rate and direction. The similarity between the two representations is clear, but seems rather contrived. We might imagine other similar analogues, between "in 3 spacesteps" and "in 3 timesteps", where "in" is true whenever the object is within some arbitrary number of steps, plus or minus x.
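Under the simulation's assumptions (constant per-timestep velocity, agent at the origin), "functWillImpactIn" reduces to a short closed form; the following is a sketch with an illustrative signature:

// Timesteps until the object, at (px, py) with per-timestep velocity
// (vx, vy), reaches the agent at the origin. For straight-line motion
// this is the time of closest approach, -(p.v)/(v.v); since the object
// is always aimed through the agent, closest approach is impact, and
// the expression reduces to distance / speed. A result <= 0 means the
// object has already passed the agent and is moving away.
double willImpactIn(double px, double py, double vx, double vy) {
    return -(px * vx + py * vy) / (vx * vx + vy * vy);
}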

Learning

In order to be confident that the model was generating these isomorphisms itself, rather than being fed them, we devised two learning algorithms: one a symbolic search, the other connectionist. Unfortunately, at the time of writing, neither is yielding useful results, so their relative success is inconclusive.

The symbolic search employs an iterative-deepening depth-first algorithm, trying every possible algorithm-string up to a given length in order to find the most efficient one that matches all of the utterances the agent has heard to the corresponding positions of the object. It will inevitably find correct, efficient solutions if they exist at a given string length, and we can broadly measure the distance between different strings by counting how many permutations it takes to get from the most efficient form of one to the other.
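The enumeration at the heart of this search might be sketched as follows; the consistency test, which replays the evaluator over every heard (position, utterance) pair, is left as a callback, and all names are ours rather than the original code's:

#include <functional>
#include <string>
#include <vector>

// Iterative deepening over token strings: try every string of length 1,
// then 2, and so on up to maxLen, returning the first one the caller
// judges consistent with all heard data. Because shorter strings are
// tried first, the first hit is also a most-efficient (shortest) formula.
std::vector<std::string> searchFormula(
        const std::vector<std::string>& vocab, int maxLen,
        const std::function<bool(const std::vector<std::string>&)>& consistent) {
    std::vector<std::string> tokens;
    std::function<bool(int)> extend = [&](int len) {
        if ((int)tokens.size() == len)
            return consistent(tokens);        // replay evaluator over the data
        for (const std::string& t : vocab) {  // depth-first over the vocabulary
            tokens.push_back(t);
            if (extend(len)) return true;
            tokens.pop_back();
        }
        return false;
    };
    for (int len = 1; len <= maxLen; ++len)
        if (extend(len)) return tokens;
    return {};                                // nothing consistent up to maxLen
}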

The second learning method employs a backpropagation network, which takes the landmark and trajector coordinates as input and uses the utterances the agent hears as targets, so that it should learn to produce the right utterance for any given landmark/trajector relation. The problem with connectionist approaches is that they require a fixed input/output length, and they cannot nest algorithms within one another to produce more powerful ones later. However, we might hope to see interesting, related patterns in the hidden weights, showing how independently trained nets represent temporal and spatial relations.
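To fix ideas, here is a minimal sketch of such a network (one hidden layer trained by online backpropagation; the layer sizes, learning rate and four-term output coding are our assumptions, not the original settings):

#include <cmath>
#include <cstdlib>

// 4 inputs (landmark x, y and trajector x, y), HID hidden units, one
// sigmoid output per utterance term (e.g. front, behind, near, far).
const int IN = 4, HID = 8, OUT = 4;
const double RATE = 0.1;

double sig(double x) { return 1.0 / (1.0 + std::exp(-x)); }

struct Net {
    double w1[HID][IN + 1];            // input->hidden weights (+1 bias)
    double w2[OUT][HID + 1];           // hidden->output weights (+1 bias)

    Net() {                            // small random weights break symmetry
        for (auto& row : w1) for (double& w : row)
            w = 0.2 * std::rand() / RAND_MAX - 0.1;
        for (auto& row : w2) for (double& w : row)
            w = 0.2 * std::rand() / RAND_MAX - 0.1;
    }

    // One online training step on a single (coordinates, utterance) pair.
    void train(const double in[IN], const double target[OUT]) {
        double h[HID], o[OUT];
        for (int j = 0; j < HID; ++j) {               // forward: hidden layer
            double s = w1[j][IN];
            for (int i = 0; i < IN; ++i) s += w1[j][i] * in[i];
            h[j] = sig(s);
        }
        for (int k = 0; k < OUT; ++k) {               // forward: output layer
            double s = w2[k][HID];
            for (int j = 0; j < HID; ++j) s += w2[k][j] * h[j];
            o[k] = sig(s);
        }
        double dOut[OUT], dHid[HID] = {};             // backward: deltas
        for (int k = 0; k < OUT; ++k) {
            dOut[k] = (target[k] - o[k]) * o[k] * (1.0 - o[k]);
            for (int j = 0; j < HID; ++j) dHid[j] += dOut[k] * w2[k][j];
        }
        for (int k = 0; k < OUT; ++k) {               // update hidden->output
            for (int j = 0; j < HID; ++j) w2[k][j] += RATE * dOut[k] * h[j];
            w2[k][HID] += RATE * dOut[k];
        }
        for (int j = 0; j < HID; ++j) {               // update input->hidden
            double d = dHid[j] * h[j] * (1.0 - h[j]);
            for (int i = 0; i < IN; ++i) w1[j][i] += RATE * d * in[i];
            w1[j][IN] += RATE * d;
        }
    }
};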

 

Future work

In the future, we would like to see how converting from egocentric to exocentric coordinates would affect the representations involved. We might implement this by feeding non-origin landmark coordinates to the algorithms, where the landmark represents another agent. More importantly, we would like to see how agent movement affects the situation, bringing out the ego-moving vs time-moving representations that Boroditsky and Ramscar (2002) discuss.

 

Conclusion

We were able to establish that this algorithmic representation of spatial and temporal concepts should be rich and flexible enough to build the representations we want. Without the learning data, however, we were not able to show what we had hoped: that similarities in the structure of spatial and temporal representations could self-organise. Nor were we able to show that utterances corresponding to these concepts could self-organise between two agents utilising different learning methods.

 

References

Boroditsky, L. (2001). Does language shape thought? English and Mandarin speakers' conceptions of time. Cognitive Psychology, 43(1), 1-22.

Boroditsky, L. & Ramscar, M. (2002). The Roles of Body and Mind in Abstract Thought. Psychological Science, 13(2), 185-188.